feat: add distributed tracing with timing spans and trace context propagation #1543
Code Review
This pull request integrates distributed tracing using OpenTelemetry, enabling the emission of wait and execution duration spans for PipelineRuns and the propagation of trace context across Snapshots and Releases. The review feedback identifies several logic and efficiency improvements, such as ensuring spans are only marked as emitted upon PipelineRun completion to prevent partial traces, optimizing TaskRun lookups using List operations instead of sequential Gets, handling malformed sampler arguments, and utilizing a composite propagator to maintain support for standard headers like Baggage.
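One of the review items above concerns handling malformed sampler arguments. Below is a minimal, SDK-independent Go sketch of that fallback. It assumes the standard OTEL_TRACES_SAMPLER / OTEL_TRACES_SAMPLER_ARG variables and the sampler names from the OpenTelemetry specification. The samplerConfig type and resolveSampler function are hypothetical and are not the PR's code. Falling back to the default sampler for an unknown name is also an assumption about the desired behavior. Bad or out-of-range ratios degrade to 1.0 with a warning instead of failing startup:

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// samplerConfig is a hypothetical, SDK-independent description of the
// sampler selected by OTEL_TRACES_SAMPLER and OTEL_TRACES_SAMPLER_ARG.
type samplerConfig struct {
	Name        string  // canonical sampler name, e.g. "parentbased_traceidratio"
	ParentBased bool    // true for the parentbased_* family
	Ratio       float64 // used only by the *traceidratio samplers
}

// defaultSampler is the OpenTelemetry default when OTEL_TRACES_SAMPLER is unset.
const defaultSampler = "parentbased_always_on"

// resolveSampler maps the two env values onto a samplerConfig. Unknown names
// and malformed or out-of-range ratios fall back to safe defaults and are
// reported through warn instead of aborting startup.
func resolveSampler(name, arg string) (cfg samplerConfig, warn string) {
	name = strings.ToLower(strings.TrimSpace(name))
	if name == "" {
		name = defaultSampler
	}
	cfg = samplerConfig{Name: name, ParentBased: strings.HasPrefix(name, "parentbased_"), Ratio: 1.0}
	switch name {
	case "always_on", "always_off", "parentbased_always_on", "parentbased_always_off":
		return cfg, ""
	case "traceidratio", "parentbased_traceidratio":
		arg = strings.TrimSpace(arg)
		if arg == "" {
			return cfg, "" // unset argument: default ratio of 1.0
		}
		r, err := strconv.ParseFloat(arg, 64)
		// The negated range check also rejects NaN, which ParseFloat accepts.
		if err != nil || !(r >= 0 && r <= 1) {
			return cfg, fmt.Sprintf("invalid OTEL_TRACES_SAMPLER_ARG %q, using ratio 1.0", arg)
		}
		cfg.Ratio = r
		return cfg, ""
	default:
		return samplerConfig{Name: defaultSampler, ParentBased: true, Ratio: 1.0},
			fmt.Sprintf("unknown OTEL_TRACES_SAMPLER %q, using %s", name, defaultSampler)
	}
}

func main() {
	cfg, warn := resolveSampler(os.Getenv("OTEL_TRACES_SAMPLER"), os.Getenv("OTEL_TRACES_SAMPLER_ARG"))
	if warn != "" {
		fmt.Fprintln(os.Stderr, "warning:", warn)
	}
	fmt.Printf("%+v\n", cfg)
}
```

Returning a warning rather than an error prevents a typo in deployment config from taking the controller down. The resolved config would then be passed to whichever SDK sampler constructor the service actually uses.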
Force-pushed from 0cb7da2 to dc66ed5.
/ok-to-test
Force-pushed from dc66ed5 to 3db0d33.
Found and split out an unrelated regression while testing this PR end-to-end on CRC: #1555. Without that fix, Snapshot -> Release stalls on Application-model tenants. It is not a blocker for this PR, and merge order doesn't matter.
Force-pushed from ddc4a5a to a5ecfdb.
@ci-operator, integration-service is undergoing implementation of the new component model, which is one of the core components of the service.
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #1543 +/- ##
==========================================
+ Coverage 65.67% 73.58% +7.91%
==========================================
Files 65 68 +3
Lines 8923 9200 +277
==========================================
+ Hits 5860 6770 +910
+ Misses 2366 1750 -616
+ Partials 697 680 -17
Flags with carried forward coverage won't be shown.
... and 31 files with indirect coverage changes
Where was this requested? Is there a story related to these changes? Why do we need this change, and what is it for?
@jsztuka here's the ADR being implemented; this PR covers part of it: https://github.com/konflux-ci/architecture/blob/main/ADR/0062-distributed-tracing.md
Force-pushed from a5ecfdb to 86e8bc3.
Thanks for the review and for flagging the gaps. Both flows are now covered. Also folded in the
Force-pushed from a641fe6 to f2c7151.
/ok-to-test |
Force-pushed from f2c7151 to cd6cef3.
🤖 Review · Started 1:40 PM UTC
Resolved a rebase conflict in Also added a few tests around
🤖 Finished Review · ✅ Success · Started 1:40 PM UTC · Completed 1:56 PM UTC
Force-pushed from cd6cef3 to 53aeaf6.
🤖 Finished Review · ✅ Success · Started 7:49 PM UTC · Completed 8:03 PM UTC
Propagate W3C trace context from build PipelineRuns to Snapshots, integration PipelineRuns, and Release CRs. Emit waitDuration and executeDuration timing spans on completed PipelineRuns across both ComponentGroup and Application model flows. Tracing is opt-in via OTEL_EXPORTER_OTLP_ENDPOINT; without it set, the service uses a noop tracer. The sampler family is selected via OTEL_TRACES_SAMPLER.

Assisted-by: Claude Code
Signed-off-by: Josiah England <jengland@redhat.com>
Force-pushed from 53aeaf6 to 659b591.
@dirgim: rebased on main. The conflicts were additive only (loader/tracing imports, condition consts), so there is no behavior change since your last /ok-to-test. Mind re-running it on the new HEAD so CI picks it up?
🤖 Finished Review · ✅ Success · Started 6:39 PM UTC · Completed 6:55 PM UTC
/ok-to-test
🤖 Finished Retro · ✅ Success · Started 9:00 PM UTC · Completed 9:09 PM UTC
Retro Analysis: PR #1543 (Distributed Tracing)
This XXL PR (2100+ lines, 23 files) was open for 2 months with 10 review agent iterations, 5 gemini-code-assist suggestions, and human review from 4 maintainers.
What went well
What could go better
Proposal filed
Proposals filed
Propagate W3C trace context from build PipelineRuns through Snapshots,
integration PipelineRuns, and Release CRs. Emit waitDuration and
executeDuration timing spans on PipelineRun completion across both
ComponentGroup and Application model flows. When a Snapshot is skipped
because a newer one for the same Application has already released,
emit a supersede waitDuration so the trace distinguishes deliberate
dedup from a broken chain.
Annotation constants live in pkg/tracing: TimingEmittedAnnotation under
delivery.tekton.dev (controller-internal), SpanContextAnnotation under
tekton.dev (cross-service contract). See docs/tracing.md.

Assisted-by: Claude Code